Language-Agnostic Reproducible Data Analysis Using Literate Programming

Authors

  • Boris Vassilev
  • Riku Louhimo
  • Elina Ikonen
  • Sampsa Hautaniemi
Abstract

A modern biomedical research project can easily contain hundreds of analysis steps, and the lack of reproducibility of these analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that, in practice, reproducibility cannot be easily achieved. Literate programming is an approach to presenting computer programs to human readers. The code is rearranged to follow the logic of the program and is accompanied by an explanation of that logic in a natural language. The code executed by the computer is extracted from the literate source code. As such, literate programming is an ideal formalism for systematizing analysis steps in biomedical research. We have developed the reproducible computing tool Lir (literate, reproducible computing), which allows a tool-agnostic approach to biomedical data analysis. We demonstrate the utility of Lir by applying it to a case study. Our aim was to investigate the role of endosomal trafficking regulators in the progression of breast cancer. In this analysis, a variety of tools were combined to interpret the available data: a relational database, standard command-line tools, and a statistical computing environment. The analysis revealed that the lipid-transport-related genes LAPTM4B and NDRG1 are coamplified in breast cancer patients, and identified genes potentially cooperating with LAPTM4B in breast cancer progression. Our case study demonstrates that with Lir, an array of tools can be combined in the same data analysis to improve efficiency, reproducibility, and ease of understanding. Lir is open-source software available at github.com/borisvassilev/lir.
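
The extraction step mentioned in the abstract (often called "tangling") can be illustrated with a minimal sketch. It assumes a noweb-style chunk syntax, where `<<name>>=` opens a named code chunk, a lone `@` closes it, and `<<name>>` on its own line refers to another chunk. This syntax and the helper names `parse_chunks` and `tangle` are assumptions made for illustration; they are not taken from Lir and may differ from what Lir actually does.

```python
import re

# Assumed noweb-style markers (illustrative; not necessarily Lir's syntax).
CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # "<<name>>=" opens a chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # "<<name>>" refers to a chunk


def parse_chunks(source):
    """Collect code chunks, keyed by name, from a literate source string."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in prose
    for line in source.splitlines():
        if current is None:
            match = CHUNK_DEF.match(line)
            if match:
                current = match.group(1)
                chunks.setdefault(current, [])
            # Any other line outside a chunk is prose and is skipped.
        elif line.strip() == "@":
            current = None  # a lone "@" closes the chunk
        else:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk `name` recursively into a list of code lines.

    No cycle detection: a chunk that refers to itself recurses forever.
    """
    lines = []
    for line in chunks[name]:
        match = CHUNK_REF.match(line)
        if match:
            # Replace the reference with the referenced chunk's lines,
            # keeping the indentation of the reference.
            lines.extend(tangle(chunks, match.group(2), indent + match.group(1)))
        else:
            lines.append(indent + line)
    return lines


# The prose order follows the explanation; the code order is restored on extraction.
LITERATE_SOURCE = """\
First we say what the script does overall.
<<main>>=
<<load data>>
print(sum(values))
@
The data are defined later, where the explanation needs them.
<<load data>>=
values = [1, 2, 3]
@
"""

chunks = parse_chunks(LITERATE_SOURCE)
code = "\n".join(tangle(chunks, "main"))
print(code)
```

Here the literate source introduces the `main` chunk before the data it depends on, and the extracted program places the `load data` lines first, in the order the interpreter needs.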

Similar resources

lpEdit: an editor to facilitate reproducible analysis via literate programming

There is evidence to suggest that a surprising proportion of published experiments in science are difficult if not impossible to reproduce. The concepts of data sharing, leaving an audit trail and extensive documentation are fundamental to reproducible research, whether it is in the laboratory or as part of an analysis. In this work, we introduce a tool for documentation that aims to make analy...

Scheme Program Source Code as a Semistructured Data

While traditional literate programming languages utilize a combination of a typesetting language and a programming language and are usually oriented towards a printed paper presentation of a program, the proposed technique combines a programming language with a semistructured markup and is intended for transformation of the Scheme source code into semistructured data that may be transformed or ...

Analysis of Literate Programs from the Viewpoint of Reuse

Donald Knuth created the WEB system for literate programming when he wrote the second version of TEX, a book-quality formatting system. Levy later created CWEB, which is based on Knuth’s WEB using the C programming language and supporting development using the C and C++ programming languages. Krommes’ FWEB is based on CWEB and supports several programming languages. We analyze some parts of the...

Advancements in RNASeqGUI towards a Reproducible Analysis of RNA-Seq Experiments

We present the advancements and novelties recently introduced in RNASeqGUI, a graphical user interface that helps biologists to handle and analyse large data collected in RNA-Seq experiments. This work focuses on the concept of reproducible research and shows how it has been incorporated in RNASeqGUI to provide reproducible (computational) results. The novel version of RNASeqGUI combines graphi...

Algorithms and literate programs for weighted low-rank approximation with missing data

Linear models identification from data with missing values is posed as a weighted low-rank approximation problem with weights related to the missing values equal to zero. Alternating projections and variable projections methods for solving the resulting problem are outlined and implemented in a literate programming style, using Matlab/Octave’s scripting language. The methods are evaluated on sy...

Journal title:

Volume 11  Issue

Pages  -

Publication date: 2016